On-Line Viterbi Algorithm for Analysis of Long Biological Sequences

Authors

  • Rastislav Srámek
  • Brona Brejová
  • Tomás Vinar
Abstract

Hidden Markov models (HMMs) are routinely used to analyze long genomic sequences and identify features such as genes, CpG islands, and conserved elements. The commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm, which decodes HMMs in much smaller space. Our analysis shows that the algorithm's expected maximum memory is Θ(m log n) on two-state HMMs. We also demonstrate experimentally that the algorithm significantly reduces the memory needed to decode a simple gene-finding HMM on both simulated and real DNA sequences, without a significant slow-down compared to the classical Viterbi algorithm.
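
As a rough illustration of where the O(mn) memory of the classical algorithm comes from, here is a minimal Python sketch of standard Viterbi decoding in log space. It is not the on-line algorithm introduced in the paper. The function name `viterbi` and the dictionary-based model layout (`log_start`, `log_trans`, `log_emit`) are assumptions made for this example only.

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Classical Viterbi decoding (illustrative sketch, log-space scores).

    obs       -- sequence of observed symbols, length n
    states    -- list of the m hidden states
    log_start -- log_start[s]: log-probability of starting in state s
    log_trans -- log_trans[p][s]: log-probability of moving from p to s
    log_emit  -- log_emit[s][x]: log-probability that state s emits x
    Returns the most probable state path as a list of length n.
    """
    n = len(obs)
    if n == 0:
        return []

    # score[s]: best log-probability of any path ending in s at the
    # current position. Only one column is needed, so this is O(m).
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}

    # back[i][s]: best predecessor of state s at position i.
    # This n-by-m table of back pointers is the O(mn) memory cost.
    back = [{} for _ in range(n)]

    for i in range(1, n):
        new_score = {}
        for s in states:
            prev = max(states, key=lambda p: score[p] + log_trans[p][s])
            back[i][s] = prev
            new_score[s] = score[prev] + log_trans[prev][s] + log_emit[s][obs[i]]
        score = new_score

    # Trace back from the best final state; the whole table must be
    # retained until this point, which is what makes long inputs costly.
    path = [max(states, key=lambda s: score[s])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    path.reverse()
    return path
```

The score column needs only O(m) space, but the back-pointer table grows linearly with the sequence length because the traceback starts only after the last symbol has been processed. That table is the memory the abstract describes as impractical for whole chromosomes.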

Similar resources

On-line Viterbi Algorithm and Its Relationship to Random Walks

In this paper, we introduce the on-line Viterbi algorithm for decoding hidden Markov models (HMMs) in much smaller than linear space. Our analysis of two-state HMMs suggests that the expected maximum memory used to decode a sequence of length n with an m-state HMM can be as low as Θ(m log n), without a significant slow-down compared to the classical Viterbi algorithm. Classical Viterbi algorithm req...

Generalized Baum-Welch and Viterbi Algorithms Based on the Direct Dependency among Observations

The parameters of a Hidden Markov Model (HMM) are transition and emission probabilities. Both can be estimated using the Baum-Welch algorithm. The process of discovering the sequence of hidden states, given the sequence of observations, is performed by the Viterbi algorithm. In both Baum-Welch and Viterbi algorithms, it is assumed that...

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering the vast amount of whole transcriptome sequencing data, it seems necessary to develop an algorithm to assemble transcriptome data. In this study we propose an algorithm for transcriptome assembly in the absence of a reference genome. First, the contiguous sequences are generated using de Bruijn graph with d...

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using a deep neural network (DNN) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...

Terminology of Combining the Sentences of Farsi Language with the Viterbi Algorithm and BI-GRAM Labeling

This paper, based on the Viterbi algorithm, selects the most likely combination of different wordings from a variety of scenarios. In this regard, the bigram and unigram tags of each word, based on the letters forming the words, as well as the bigram and unigram labels after the breakdown into the composition or moment of transition from the decomposition to the combination obtained from th...

Publication date: 2007